Readme

This notebook is for Li Bin Song's MM805 Assignment.

The source code is located on GitHub, or you can unzip the archive from eClass.

Requirements to run this notebook

The commands shown here are for Linux (Ubuntu).

  1. Python 3.8
  2. Set up a Python virtual environment:
     python3.8 -m venv .venv
     . .venv/bin/activate
     pip install --upgrade pip

  3. Install the required packages: pip install -r requirements.txt
  4. Open report.ipynb, select the Python kernel, and run it

Q1. (40 points) Feature extraction and matching

1a. Harris Corner Point detection

The feature I implemented is the Harris corner detector. The logic is based on the Wikipedia article.

Answer:

The key function is get_harris_points (some of the code is reused from my last term's 811 assignment); the detailed code can be found in assignment_code.py. An extra function, display_corner_points, is written to display the detected points on the image as red dots.

The main logic is:

  1. convert the image to grayscale
  2. calculate the spatial derivatives Ix, Iy
  3. build the structure matrix A
  4. compute the Harris response (calculate Det(A) and Trace(A))
  5. choose corner points based on the response
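As a sketch of steps 2-5, the response computation can be written with NumPy alone. This is an illustrative stand-in for get_harris_points, not the notebook's actual code; the 3x3 window and k = 0.05 are assumptions, not the values used in assignment_code.py.

```python
import numpy as np

def box3(a):
    """Sum each pixel's 3x3 neighbourhood (a crude window for matrix A)."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def harris_response(gray, k=0.05):
    """Harris response R = det(A) - k * trace(A)^2 for every pixel."""
    # step 2: spatial derivatives (np.gradient returns d/drow, d/dcol)
    Iy, Ix = np.gradient(gray.astype(float))
    # step 3: windowed sums form the structure matrix A = [[Sxx, Sxy], [Sxy, Syy]]
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    # step 4: det and trace give the response
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace

# step 5: threshold the response to pick corners
gray = np.zeros((20, 20))
gray[5:15, 5:15] = 1.0              # a bright square: 4 obvious corners
R = harris_response(gray)
corners = np.argwhere(R > 0.5 * R.max())
```

Note the characteristic signs: R is strongly positive at corners, negative along edges, and near zero in flat regions, which is what the thresholding in step 5 relies on.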

1B. Feature Matching

Question: Implement a simple feature matching by using two feature descriptors of your choice (you can use the available feature descriptors in OpenCV or Matlab). Compare the two feature descriptors and the matching results on a few different images.

Answer:

The matching function uses the k-nearest-neighbour algorithm. The function accepts a descriptor name and displays the result.

The first descriptor is ORB and the second one is SIFT.

SIFT gives the better result; from the output you can see that SIFT detects more good keypoints.

Credit: the plotting function references some logic from the OpenCV example code.
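The k-NN matching step can be sketched in plain NumPy. This is an illustrative stand-in for the notebook's matching function (which uses OpenCV's matchers): k = 2 nearest neighbours under Euclidean distance with Lowe's ratio test, as is typical for float descriptors like SIFT's; the toy descriptors below are made up for the demo.

```python
import numpy as np

def knn_ratio_match(desc1, desc2, ratio=0.75):
    """For each descriptor in desc1, find its two nearest neighbours in
    desc2 and keep the match only when the best distance is clearly
    below the second best (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

# toy descriptors: desc1 is a slightly perturbed copy of desc2,
# so descriptor i should match descriptor i
desc2 = np.eye(4) * 10.0
desc1 = desc2 + 0.1
matches = knn_ratio_match(desc1, desc2)
```

ORB descriptors are binary, so in practice they are compared with Hamming distance rather than the Euclidean distance used here.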

1C Feature tracking

Question: (10 points) Instead of finding feature points independently in multiple images and then matching them, find features in the first image of a video or image sequence and then re-locate the corresponding points in the next frames using search and gradient descent (Shi and Tomasi 1994). When the number of tracked points drops below a threshold or new regions in the image become visible, find additional points to track.

Answer:
The code is included below; here are some comments to guide you through it.

The function get_pre_points generates good features to track.

The function get_new_points searches the new frame based on pre_points.

The function feature_match_by_tracking is the main function. Its input is an image folder containing video frames downloaded from the internet. This could be changed to an mp4, but for testing purposes the image frames are good enough. In this function there is the following logic:

if m < min_threshold:
    # recalculate pre_points
    print(f"good points is lower than {min_threshold}, recalculating...")
    (pre_points, kp1) = get_pre_points(pre_gray, n)
    (new_points, kp2, matches) = get_new_points(pre_gray, new_gray, pre_points)

This logic checks how many good points are left; if the count is lower than the threshold, it re-retrieves pre_points from good features to track.

The output shows the current frame index, how many good features are left, and matching lines between the previous frame and the current frame. Frame 36 shows the result when the re-retrieval of pre_points was triggered.
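The re-location step inside get_new_points can be sketched as a single-point translational Lucas-Kanade solve, in the spirit of Shi and Tomasi: the window around the point in the previous frame is the template, and Gauss-Newton steps on the matching error move the estimate in the new frame. This is a simplified NumPy illustration, not the notebook's actual code; all names, the window size, and the Gaussian-blob demo are assumptions.

```python
import numpy as np

def bilinear_patch(img, cy, cx, h):
    """Sample a (2h+1)x(2h+1) patch centred at the sub-pixel point (cy, cx)."""
    ys = np.arange(-h, h + 1) + cy
    xs = np.arange(-h, h + 1) + cx
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    I = img.astype(float)
    return ((1 - wy) * (1 - wx) * I[np.ix_(y0, x0)]
            + (1 - wy) * wx * I[np.ix_(y0, x0 + 1)]
            + wy * (1 - wx) * I[np.ix_(y0 + 1, x0)]
            + wy * wx * I[np.ix_(y0 + 1, x0 + 1)])

def track_point(img0, img1, pt, win=9, iters=20):
    """Re-locate pt=(y, x) from img0 in img1 by iterating d <- d + A^-1(-b)."""
    y, x = pt
    h = win // 2
    T = img0[y - h:y + h + 1, x - h:x + h + 1].astype(float)   # template
    Gy, Gx = np.gradient(T)                                    # template gradients
    A = np.array([[np.sum(Gx * Gx), np.sum(Gx * Gy)],
                  [np.sum(Gx * Gy), np.sum(Gy * Gy)]])
    dy = dx = 0.0
    for _ in range(iters):
        e = bilinear_patch(img1, y + dy, x + dx, h) - T        # residual
        step = np.linalg.solve(A, -np.array([np.sum(Gx * e), np.sum(Gy * e)]))
        dx += step[0]
        dy += step[1]
    return dx, dy

# demo: a Gaussian blob moved by (dy, dx) = (2, 1) between frames
Y, X = np.mgrid[0:40, 0:40]
blob = lambda cy, cx: np.exp(-((Y - cy) ** 2 + (X - cx) ** 2) / 18.0)
dx, dy = track_point(blob(20, 20), blob(22, 21), (20, 20))
```

The iteration converges because each step is a least-squares solve of the linearised brightness-constancy error; this is the "search and gradient descent" the question refers to, reduced to one point and pure translation.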

Question 2

Q2. (40 points) Optical flow: (use the motion sequences available at https://vision.middlebury.edu/flow or http://sintel.is.tue.mpg.de) Compute optical flow (spline-based or per-pixel) between two images, using one or more of the techniques described in the lecture.

a. (15 points) Implement the Lucas-Kanade algorithm (your code).
b. (25 points) Visualize the optical flow in two video sequences, provide some examples where Lucas-Kanade fails. Explain the reason in each case.

Answer:
a. The function LK_OpticalFlow implements the Lucas-Kanade algorithm.
b. The function display_opitical_flow shows 5 images (6 processed frames) as the result. The number of outputs can be adjusted in the code.

An example where Lucas-Kanade fails is shown in figure 1. The main reason is the aperture problem, and it can be fixed by adding an image pyramid and calculating the optical flow at each level.
figure 1

The algorithm may also fail if the movement is too big or the light intensity changes too much, but that failure does not show up on these example frames because those conditions did not occur.
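A per-pixel version of the solve makes the aperture problem concrete: wherever the structure matrix A is near-singular (an edge or a flat region), the 2x2 system has no reliable solution and those pixels are skipped. This is a minimal NumPy sketch, not the notebook's LK_OpticalFlow; the window size, threshold, and synthetic demo are assumptions.

```python
import numpy as np

def box(a, h=2):
    """Sum over a (2h+1)x(2h+1) window around every pixel."""
    p = np.pad(a, h)
    H, W = a.shape
    n = 2 * h + 1
    return sum(p[i:i + H, j:j + W] for i in range(n) for j in range(n))

def lk_flow(img0, img1, h=2, eps=1e-9):
    """Per-pixel Lucas-Kanade: solve A [u, v]^T = -b in each window."""
    I0, I1 = img0.astype(float), img1.astype(float)
    # average the two frames' gradients to reduce bias; It from the difference
    Iy0, Ix0 = np.gradient(I0)
    Iy1, Ix1 = np.gradient(I1)
    Ix, Iy = 0.5 * (Ix0 + Ix1), 0.5 * (Iy0 + Iy1)
    It = I1 - I0
    Sxx, Syy, Sxy = box(Ix * Ix, h), box(Iy * Iy, h), box(Ix * It * 0 + Ix * Iy, h)
    Sxt, Syt = box(Ix * It, h), box(Iy * It, h)
    det = Sxx * Syy - Sxy * Sxy
    # aperture problem: where det(A) ~ 0 the flow is undefined; real trackers
    # threshold the smaller eigenvalue of A instead (Shi-Tomasi)
    ok = det > eps
    safe = np.where(ok, det, 1.0)
    u = np.where(ok, (-Sxt * Syy + Syt * Sxy) / safe, 0.0)
    v = np.where(ok, (-Syt * Sxx + Sxt * Sxy) / safe, 0.0)
    return u, v

# demo: a smooth random texture shifted right by one pixel (flow u = +1)
rng = np.random.default_rng(0)
img0 = rng.random((64, 64))
for _ in range(4):                    # blur so the linearisation holds
    img0 = box(img0) / 25.0
img1 = np.roll(img0, 1, axis=1)
u, v = lk_flow(img0, img1)
```

On a large, untextured, or fast motion this single-level solve breaks down, which is exactly why the pyramid extension mentioned above helps.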

Q3. (20 points) Head detection:

You are asked to implement an algorithm that can detect head in images and videos. The algorithm should detect head regardless of viewing direction of the camera. What is your suggestion? Try to eliminate false positives as much as you can.

Answer

1) The first option I considered was contour detection. The assumption is that the head is oval shaped: if I can get the contours first, I can filter for oval shapes and obtain a head mask. I wrote a test program to detect contours using cv2.findContours; the code is in head.py and the function is named contour_detect. But the result did not work well, because the intensity changes throughout the video and the contour of a head could not be extracted reliably, so I gave up on this option.
2) The next option I used is the Haar cascade object detection method. Haar feature-based cascade classifiers were proposed by Paul Viola and Michael Jones in their 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features". Three types of Haar features are defined, and the whole process requires training in order to do object detection. In the code I am using the custom-trained haar_custom.xml features for the detection. The code is shown in the next Python code cell.

Credit: haar_custom.xml is from https://github.com/Pramod-Devireddy/head_detection, which used thousands of head images for training. It covers the front, side, back, etc. of the head. A few samples are shown here.
(sample images: head1, head2, head3, head4)
3) In my testing I used the custom-trained haar_custom.xml, which includes the front, back, and side of the head. The detection works well for the front of the face because it has more features to detect, and it does not work as well for other head orientations. For better results, I suspect a convolutional neural network would perform much better.

Result 1

1) The result from default parameters can be watched from Youtube. The url is https://youtu.be/EjcGXcOtiMw.

True Positive results:
true positive1 true positive2 true positive3

False Positive results:
False postive1 False postive2

Reduce false positive

After testing, a few things impact the false positive results.

1) Haar features capture patterns of intensity change. For example, left eye, nose, right eye form a darker-brighter-darker pattern. These patterns together represent the object's features, in our case the face or part of the head. This is the fundamental assumption, and a detection fires whenever a pattern matches the trained classifier.

2) Two parameters are used at the detection stage, scaleFactor=1.1 and minNeighbors=3. minNeighbors impacts false positives, while scaleFactor impacts both search performance and false positives. After tuning, minNeighbors is set to 5 (default 3) and scaleFactor is set to 1.3 (default 1.1).

3) The image size also impacts search performance and false positives. In my code I resize the frame to 1/3 of the original to get better performance and fewer false positives.

The video can be watched on YouTube at https://youtu.be/-FnyaVMCaUk.

Two screenshots are shown here.
snapshot snapshot

Q3 Code

To run the code, please make sure you have the video files that come with this project. The video files are under the data folder.